Simultaneous MultiStreaming for Complexity-Effective VLIW Architectures

Authors

  • Pradeep Rao
  • S. K. Nandy
  • M. N. V. Satya Kiran
Abstract

Very Long Instruction Word (VLIW) architectures exploit instruction-level parallelism (ILP) with the help of the compiler to achieve higher instruction throughput with minimal hardware. However, control and data dependencies between operations limit the available ILP, which not only hinders the scalability of VLIW architectures but also results in code size expansion. Although speculation and predicated execution mitigate ILP limitations due to control dependencies to a certain extent, they increase hardware cost and exacerbate code size expansion. Simultaneous multistreaming (SMS) can significantly improve operation throughput by allowing interleaved execution of operations from multiple instruction streams. In this paper we study SMS for VLIW architectures and quantify its benefits using a case study of the MPEG-2 video decoder. We also propose the notion of virtual resources for VLIW architectures, which decouples architectural resources (resources exposed to the compiler) from the microarchitectural resources in order to limit code size expansion. Our results for a VLIW architecture demonstrate that: (1) SMS delivers much higher throughput than that achieved by speculation and predicated execution; (2) adding speculation and predicated execution support on top of SMS increases performance by only around 12% on average, a minor gain that might not warrant the additional hardware complexity involved; and (3) the notion of virtual resources is very effective in reducing no-operations (NOPs), and consequently code size, with little or no impact on performance.
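
As a rough illustration of why interleaving operations from several streams can raise slot utilization, the following toy model may help. It is a minimal sketch and not the paper's simulator: the names `run_single` and `run_sms`, the round-robin priority, and the assumption that a VLIW word may be issued in parts across cycles are all hypothetical simplifications. Each stream is a list of VLIW words, each word is a list of operations, and empty slots in a word stand for NOPs.

```python
from collections import deque


def run_single(stream, width):
    """Baseline: one VLIW word issues per cycle; unused slots are NOPs."""
    cycles = len(stream)
    ops = sum(len(word) for word in stream)
    return cycles, ops / (cycles * width)


def run_sms(streams, width):
    """Toy SMS model (hypothetical policy): each cycle, fill up to `width`
    issue slots with operations taken from the head word of every stream,
    visiting streams in rotating round-robin order. A stream moves on to
    its next word only after its current word has fully issued, and it
    advances by at most one word per cycle."""
    # Copy the words so the caller's data is not modified.
    queues = [deque(list(word) for word in s) for s in streams]
    total_ops = sum(len(word) for s in streams for word in s)
    cycles = 0
    start = 0
    n = len(queues)
    while any(queues):
        slots = width
        for i in range(n):
            q = queues[(start + i) % n]
            if not q:
                continue
            word = q[0]
            take = min(slots, len(word))
            del word[:take]      # issue `take` operations from this word
            slots -= take
            if not word:         # word fully issued: stream may advance
                q.popleft()
        cycles += 1
        start = (start + 1) % n  # rotate priority for fairness
    return cycles, total_ops / (cycles * width)


# Two illustrative streams for a 4-wide machine (made-up operations).
stream_a = [["add", "mul"], ["ld"], ["st"]]
stream_b = [["ld"], ["add", "sub"], ["mul"]]

single_cycles, single_util = run_single(stream_a, 4)
sms_cycles, sms_util = run_sms([stream_a, stream_b], 4)
print(single_cycles, single_util)
print(sms_cycles, sms_util)
```

In this model, slots that would hold NOPs when one stream runs alone are filled by the other stream's operations, which is the effect the abstract attributes to SMS. Dependencies, latencies, and resource classes are ignored here.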

Similar resources

Cluster Level Multithreading for VLIW Processors

Clustered VLIW embedded processors have become widespread due to the benefits of simple hardware and low power. However, the ILP in most applications today is limited, which discourages the design of wider-issue processors. Simultaneous MultiThreading (SMT) is a well-known technique for improving resource utilization by exploiting thread-level parallelism. However, implementing SMT is not feasible for e...


Complexity Effective ASIP Architectures for Network Processing and Multimedia Acceleration

1 Processor Design: 1.1 Technology Trends; 1.2 Application Trends; 1.3 Choice of Implementation Platforms; 1.4 ASIP Design Methodologies; 1.5 Complexity Effective Desi...


Reducing the complexity of instruction-level power models for VLIW processors

The aim of this paper is to propose a high-level power exploration framework based on an instruction-level energy model for VLIW (Very Long Instruction Word) architectures. More specifically, the paper deals with reducing the complexity of the energy model of K-issue VLIW processors from exponential in the number of operations within the instruction set, O(|ISA|^K), to...


Preferred Strategies for Optimizing Convolution on VLIW DSP Architectures

Convolution is a central algorithm for implementing linear time-invariant systems, which constitute the heart of most digital signal processing algorithms. Performance on the linear convolution algorithm has been one of the primary benchmarks used to discern the performance of dedicated digital signal processing (DSP) architectures. While DSP benchmarks are far more varied and complex...


A Study of Loop Unrolling for VLIW-based DSP Processors

With the growing popularity of DSPs and their associated applications, cost-effective software development has become a major issue. High-level language compilers are becoming more commonplace in the DSP world. While these compilers can generate correct code for DSP architectures, there remains considerable room for performance improvements. This paper addresses issues related to DSP compilatio...




Publication date: 2003